Monocular Depth Estimation Network Based on Swin Transformer

نویسندگان

چکیده

Abstract Estimating depth from a single image is challenging because 2D may correspond to many different 3D scenes with the same depth. While most deep learning based prediction methods extract features using small convolutional kernels receptive fields, which results in deformed edges and inaccurate values of distant objects estimation results. Aiming at this problem, we propose network on Swin Transformer encoder-decoder structure. We construct encoder network, can encode long-range spatial dependency various scales across channels. The decoder proposed charge fusing by operations interpolation, concatenation, convolution. Experiments KITTI NYUv2 datasets show that our get more accurate than state-of-the-art methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Monocular Vision Based Relative Depth Estimation for Hand Gesture Recognition

In this paper, we focus on real-time relative hand depth estimation with cluttered backgrounds and variable illumination for a target application to hand pushing and pulling detection. The task is characterized by a lack of consistent internal contrast in the hand combined with the complex background. We adopt a detection-and-registration strategy to predict frame-toframe scaling factor and mak...

متن کامل

Towards Domain Independence for Learning-based Monocular Depth Estimation

Modern autonomous mobile robots require a strong understanding of their surroundings in order to safely operate in cluttered and dynamic environments. Monocular depth estimation offers a geometry-independent paradigm to detect free, navigable space with minimum space and power consumption. These represent highly desirable features, especially for micro aerial vehicles. In order to guarantee rob...

متن کامل

Depth Estimation Using Monocular and Stereo Cues

Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues— such as texture variations and gradients, defocus, color/haze, etc.—that have heretofore been little exploited in such systems. Some of these cues apply even...

متن کامل

Bayesian depth estimation from monocular natural images.

Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world t...

متن کامل

Aperture Supervision for Monocular Depth Estimation

We present a novel method to train machine learning algorithms to estimate scene depths from a single image, by using the information provided by a camera’s aperture as supervision. Prior works use a depth sensor’s outputs or images of the same scene from alternate viewpoints as supervision, while our method instead uses images from the same viewpoint taken with a varying camera aperture. To en...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of physics

سال: 2023

ISSN: ['0022-3700', '1747-3721', '0368-3508', '1747-3713']

DOI: https://doi.org/10.1088/1742-6596/2428/1/012019